Character-Level Dependencies in Chinese: Usefulness and Learning

Author

  • Hai Zhao
Abstract

We investigate the possibility of exploiting character-based dependencies for Chinese information processing. Because Chinese text is made up of character sequences rather than word sequences, the word is not as natural a concept in Chinese as it is in English, nor is it easy to define without controversy. We therefore propose a character-level dependency scheme to represent the primary linguistic relationships within a Chinese sentence. The usefulness of character dependencies is verified through two specialized dependency parsing tasks. The first handles trivial character dependencies that are transformed equivalently from traditional word boundaries. The second additionally considers annotated character dependencies internal to a word. The results of character-level dependency parsing are positive in both tasks. This study provides an alternative way to formalize basic character- and word-level representations for Chinese.
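As a rough illustration of the first task, the sketch below shows one way word boundaries could be re-expressed as trivial character dependencies and then recovered. The attachment convention used here (each non-final character points to the character on its right, and the word-final character gets head 0) is an assumption made for this example and is not taken from the paper; the function names are likewise hypothetical.

```python
from typing import List, Tuple

# (1-based position, character, head position); head 0 means "no head inside the word".
CharDep = Tuple[int, str, int]


def words_to_char_dependencies(words: List[str]) -> List[CharDep]:
    """Encode a segmented sentence as character-level dependencies.

    Assumed convention (illustrative only): inside a word, every character
    is attached to the next character; the last character of each word
    receives head 0, which marks the word boundary.
    """
    deps: List[CharDep] = []
    position = 0
    for word in words:
        for offset, char in enumerate(word):
            position += 1
            is_last = offset == len(word) - 1
            head = 0 if is_last else position + 1
            deps.append((position, char, head))
    return deps


def char_dependencies_to_words(deps: List[CharDep]) -> List[str]:
    """Recover the segmentation by cutting after every character with head 0."""
    words: List[str] = []
    current = ""
    for _, char, head in deps:
        current += char
        if head == 0:
            words.append(current)
            current = ""
    if current:  # tolerate input whose last character lacks a boundary mark
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["中国", "人民"]
    deps = words_to_char_dependencies(segmented)
    for row in deps:
        print(row)
    print(char_dependencies_to_words(deps))
```

Under this assumed convention the mapping is lossless: the segmentation can always be read back from the heads, which is the sense in which such dependencies are "equivalent" to word boundaries.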

Related resources

Chinese Morphological Analysis with Character-level POS Tagging

The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters. However, existing methods have not yet fully utilized the potentials of Chinese characters. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis. We propose the first tagset designed...

Cascade Markov random fields for stroke extraction of Chinese characters

Extracting perceptually meaningful strokes plays an essential role in modeling structures of handwritten Chinese characters for accurate character recognition. This paper proposes a cascade Markov random field (MRF) model that combines both bottom-up (BU) and top-down (TD) processes for stroke extraction. In the low-level stroke segmentation proce...

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition

Recently, great progress has been made for online handwritten Chinese character recognition due to the emergence of deep learning techniques. However, previous research mostly treated each Chinese character as one class without explicitly considering its inherent structure, namely the radical components with complicated geometry. In this study, we propose a novel trajectory-based radical analys...

Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). But many non-Latin languages have hieroglyphic writing systems, involving a big alphabet with thousands or millions of characters. Each character is composed of even smaller parts, which are often ignored by previous work. In this paper, we propose a novel architecture employing two s...

Character reading fluency, word segmentation accuracy, and reading comprehension in L2 Chinese

This study investigated the relationships between lower-level processing and general reading comprehension among adult L2 (second-language) beginning learners of Chinese, in both target and non–target language learning environments. Lower-level processing in Chinese reading includes the factors of character-naming accuracy, character-naming speed, and word segmentation accuracy. The results of ...

Publication date: 2009